50 research outputs found

    Coarse-to-Fine Adaptive People Detection for Video Sequences by Maximizing Mutual Information

    Applying people detectors to unseen data is challenging, since pattern distributions such as viewpoints, motion, poses, backgrounds, occlusions and people sizes may differ significantly from those of the training dataset. In this paper, we propose a coarse-to-fine framework to adapt people detectors frame by frame during runtime classification, without requiring any additional manually labeled ground truth apart from the offline training of the detection model. The adaptation makes use of the mutual information of multiple detectors, i.e., similarities and dissimilarities estimated by pairwise correlation of their outputs. Globally, the proposed adaptation discriminates between relevant instants in a video sequence, i.e., it identifies the frames that are representative for adapting the system. Locally, it identifies the best configuration of each detector under analysis by maximizing the mutual information, thus obtaining the detection threshold of each detector. The proposed coarse-to-fine approach does not require retraining the detectors for each new scenario and uses standard people detector outputs, i.e., bounding boxes. The experimental results demonstrate that the proposed approach outperforms state-of-the-art detectors whose optimal threshold configurations are determined and fixed beforehand from offline training data.

    This work has been partially supported by the Spanish government under the project TEC2014-53176-R (HAVideo).
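    A minimal Python sketch of the kind of agreement-driven threshold selection described above, assuming two detectors that output scored bounding boxes; the IoU-based agreement statistic is an illustrative stand-in for the paper's mutual-information criterion, and all names and values are hypothetical.

```python
# Sketch: pick per-detector thresholds that maximize pairwise agreement.
# The IoU-based agreement below is a proxy, not the paper's exact measure.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def agreement(boxes_a, boxes_b, iou_min=0.5):
    """Fraction of boxes that find a counterpart in the other detector."""
    if not boxes_a or not boxes_b:
        return 0.0
    hits = sum(any(iou(a, b) >= iou_min for b in boxes_b) for a in boxes_a)
    return hits / max(len(boxes_a), len(boxes_b))

def select_thresholds(dets_a, dets_b, grid=np.linspace(0.1, 0.9, 9)):
    """Scan a threshold grid for two detectors' (box, score) outputs and
    return the pair of thresholds with maximal output agreement."""
    best, best_pair = -1.0, (grid[0], grid[0])
    for ta in grid:
        for tb in grid:
            a = [b for b, s in dets_a if s >= ta]
            b = [b for b, s in dets_b if s >= tb]
            score = agreement(a, b)
            if score > best:
                best, best_pair = score, (ta, tb)
    return best_pair
```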

    Covariance-based online validation of video tracking

    This paper is a postprint of a paper submitted to and accepted for publication in Electronics Letters and is subject to Institution of Engineering and Technology Copyright. The copy of record is available at the IEEE Digital Library.

    A novel approach is proposed for online evaluation of video tracking without ground-truth data. The temporal evolution of covariance features is exploited to detect the stability of the tracker output over time. A model validation strategy performs such detection without learning the failure cases of the tracker under evaluation. Then, the tracker performance is estimated by a finite state machine determining whether the tracker is on-target (successful) or not (unsuccessful). The experimental results over a heterogeneous dataset show that the proposed approach outperforms related state-of-the-art approaches in terms of performance and computational cost.

    This work was supported by the Spanish Government (TEC2011-25995, EventVideo).
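    A hedged sketch of covariance-based track validation with a two-state machine, assuming access to per-frame feature vectors (e.g., position, color, gradients) inside the tracked box; the log-Euclidean distance and the thresholds are illustrative choices, not the paper's exact model validation strategy.

```python
# Sketch: monitor the drift of a region-covariance descriptor over time
# and flag the track as lost after sustained instability.
import numpy as np

def region_covariance(features):
    """Covariance descriptor of an (N, d) feature matrix."""
    return np.cov(features, rowvar=False)

def _logm_spd(c):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return v @ np.diag(np.log(np.maximum(w, 1e-12))) @ v.T

def log_euclidean_dist(c1, c2, eps=1e-6):
    """Log-Euclidean distance between two SPD matrices."""
    d = c1.shape[0]
    return np.linalg.norm(_logm_spd(c1 + eps * np.eye(d))
                          - _logm_spd(c2 + eps * np.eye(d)), ord="fro")

class TrackValidator:
    """Two-state machine: 'on_target' vs 'lost', driven by descriptor drift."""
    def __init__(self, drift_max=2.0, patience=5):
        self.state, self.ref, self.bad = "on_target", None, 0
        self.drift_max, self.patience = drift_max, patience

    def update(self, features):
        cov = region_covariance(features)
        if self.ref is None:           # first frame sets the reference
            self.ref = cov
            return self.state
        drift = log_euclidean_dist(self.ref, cov)
        self.bad = self.bad + 1 if drift > self.drift_max else 0
        if self.bad >= self.patience:  # sustained drift -> declare failure
            self.state = "lost"
        return self.state
```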

    Temporal validation of particle filters for video tracking

    This is the author's version of a work that was accepted for publication in Computer Vision and Image Understanding. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Computer Vision and Image Understanding, 131 (2015), DOI: 10.1016/j.cviu.2014.06.016.

    Highlights: a novel approach to adaptively determine the temporal consistency of particle filters; the proposed method is demonstrated on online performance evaluation of tracking; temporal consistency is modeled by convolutions of mixtures of Gamma distributions; the proposed method does not need thresholds and can be used on large datasets.

    We present an approach for determining the temporal consistency of Particle Filters in video tracking, based on model validation of their uncertainty over sliding windows. The filter uncertainty is related to the consistency of the dispersion of the filter hypotheses in the state space. We learn an uncertainty model via a mixture of Gamma distributions whose optimal number of components is selected by modified information-based criteria. The time-accumulated model is estimated as the sequential convolution of the uncertainty model. Model validation is performed by verifying whether the output of the filter belongs to the convolution model through its approximate cumulative distribution function. Experimental results and comparisons show that the proposed approach improves both precision and recall over competitive approaches such as Gaussian-based online model extraction, a bank of Kalman filters, and empirical thresholding. We combine the proposed approach with a state-of-the-art online performance estimator for video tracking and show that it improves accuracy compared to the same estimator with manually tuned thresholds, while reducing the overall computational cost.

    This work was partially supported by the Spanish Government (EventVideo, TEC2011-25995) and by the EU Crowded Environments monitoring for Activity Understanding and Recognition (CENTAUR, FP7-PEOPLE-2012-IAPP) project under GA number 324359. Most of the work reported in this paper was done at the Centre for Intelligent Sensing at Queen Mary University of London.
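    An illustrative sketch of window-level validation of filter uncertainty, under stated assumptions: the per-frame uncertainty is a scalar (e.g., hypothesis dispersion), the learned model is a hard-coded two-component Gamma mixture, and the window convolution is approximated by Monte Carlo sampling rather than the paper's analytical convolution.

```python
# Sketch: is the accumulated uncertainty over a sliding window plausible
# under the time-accumulated (convolved) Gamma-mixture model?
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned mixture: (weight, shape k, scale theta) per component.
MIXTURE = [(0.7, 2.0, 0.5), (0.3, 5.0, 1.0)]

def sample_mixture(n):
    """Draw n samples from the Gamma mixture."""
    weights = np.array([w for w, _, _ in MIXTURE])
    comps = rng.choice(len(MIXTURE), size=n, p=weights / weights.sum())
    return np.array([rng.gamma(MIXTURE[c][1], MIXTURE[c][2]) for c in comps])

def window_is_consistent(uncertainties, n_mc=20000, alpha=0.05):
    """Monte Carlo check against the w-fold convolution of the mixture."""
    w = len(uncertainties)
    acc = sample_mixture(n_mc * w).reshape(n_mc, w).sum(axis=1)
    lo, hi = np.quantile(acc, [alpha / 2, 1 - alpha / 2])
    observed = float(np.sum(uncertainties))
    return lo <= observed <= hi

print(window_is_consistent([1.2, 0.8, 1.5, 1.1, 0.9]))
```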

    Cost-Aware Coalitions for Collaborative Tracking in Resource-Constrained Camera Networks

    Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. J. C. SanMiguel and A. Cavallaro, "Cost-Aware Coalitions for Collaborative Tracking in Resource-Constrained Camera Networks," in IEEE Sensors Journal, vol. 15, no. 5, pp. 2657-2668, May 2015. DOI: 10.1109/JSEN.2014.2367015.

    We propose an approach to create camera coalitions in resource-constrained camera networks and demonstrate it for collaborative target tracking. We cast coalition formation as a decentralized resource allocation process where the best cameras among those viewing a target are assigned to a coalition based on marginal utility theory. A manager is dynamically selected to negotiate with cameras whether they will join the coalition and to coordinate the tracking task. This negotiation is based not only on the utility brought by each camera to the coalition, but also on the associated cost (i.e., additional processing and communication). Experimental results and comparisons using simulations and real data show that the proposed approach outperforms related state-of-the-art methods by improving tracking accuracy in cost-free settings. Moreover, under resource limitations, the proposed approach controls the tradeoff between accuracy and cost, and achieves energy savings with only a minor reduction in accuracy.

    This work was supported in part by the EU Crowded Environments monitoring for Activity Understanding and Recognition (CENTAUR, FP7-PEOPLE-2012-IAPP) project under GA number 324359, and in part by the Artemis JU and the U.K. Technology Strategy Board as part of the Cognitive and Perceptive Cameras (COPCAMS) project under GA number 332913.
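    A sketch of coalition formation by marginal utility, under stated assumptions: each camera reports a standalone utility (e.g., expected tracking-accuracy gain) and a resource cost, and the manager greedily admits cameras while the marginal net benefit stays positive. The diminishing-returns model below is illustrative, not the paper's exact utility or negotiation protocol.

```python
# Sketch: greedy coalition formation with diminishing marginal utility.
def form_coalition(cameras, decay=0.6):
    """cameras: list of (camera_id, utility, cost), utility/cost >= 0."""
    ranked = sorted(cameras, key=lambda c: c[1] - c[2], reverse=True)
    coalition, gain = [], 1.0
    for cam_id, utility, cost in ranked:
        marginal = gain * utility - cost  # net benefit of admitting this camera
        if marginal <= 0:
            break
        coalition.append(cam_id)
        gain *= decay  # each additional camera contributes less

    return coalition

# Example: three cameras viewing the target; the third is too costly to admit.
print(form_coalition([("cam1", 0.9, 0.2), ("cam2", 0.7, 0.3), ("cam3", 0.4, 0.5)]))
# ['cam1', 'cam2']
```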

    Skin detection by dual maximization of detectors agreement for video monitoring

    This is the author's version of a work that was accepted for publication in Pattern Recognition Letters. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Pattern Recognition Letters, 34, 16 (2013), DOI: 10.1016/j.patrec.2013.07.016.

    This paper presents an approach for skin detection that is able to adapt its parameters to image data captured in video monitoring tasks with a medium field of view. It is composed of two detectors designed to obtain high- and low-probability skin pixels (regions and isolated pixels, respectively). Each detector is based on thresholding two color channels, which are dynamically selected. Adaptation is based on the agreement maximization framework, whose aim is to find the configuration with the highest similarity between the channel results. Moreover, we improve this framework by learning how the detector parameters are related and by proposing an agreement function that considers expected skin properties. Finally, both detectors are combined by morphological reconstruction filtering to keep the skin regions whilst removing wrongly detected regions. The proposed approach is evaluated on heterogeneous human activity recognition datasets, outperforming the most relevant state-of-the-art approaches.

    This work has been partially supported by the Spanish Government (TEC2011-25995, EventVideo).
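    A minimal sketch of skin detection by maximizing the agreement between two thresholded color channels, assuming the image has been converted to a color space with two informative channels (e.g., Cr and Cb of YCrCb); the Jaccard agreement and the degeneracy guard are illustrative stand-ins for the paper's learned agreement function.

```python
# Sketch: scan threshold pairs on two channels and keep the most agreeing one.
import numpy as np

def jaccard(m1, m2):
    """Jaccard index between two binary masks."""
    inter = np.logical_and(m1, m2).sum()
    union = np.logical_or(m1, m2).sum()
    return inter / union if union else 0.0

def skin_mask(ch1, ch2, grid=range(80, 181, 5)):
    """ch1, ch2: 2-D channel arrays; returns the agreed skin mask."""
    best, best_mask = -1.0, np.zeros(ch1.shape, bool)
    for t1 in grid:
        for t2 in grid:
            m1, m2 = ch1 >= t1, ch2 >= t2
            # guard against the degenerate "almost everything is skin" case
            if m1.mean() > 0.5 or m2.mean() > 0.5:
                continue
            score = jaccard(m1, m2)
            if score > best:
                best, best_mask = score, np.logical_and(m1, m2)
    return best_mask
```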

    A semantic-based probabilistic approach for real-time video event recognition

    This is the author's version of a work that was accepted for publication in Computer Vision and Image Understanding. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Computer Vision and Image Understanding, 116, 9 (2012), DOI: 10.1016/j.cviu.2012.04.005.

    This paper presents an approach for real-time video event recognition that combines the accuracy and descriptive capabilities of, respectively, probabilistic and semantic approaches. Based on a state-of-the-art knowledge representation, we define a methodology for building recognition strategies from event descriptions that consider the uncertainty of the low-level analysis. Then, we efficiently organize such strategies to perform recognition according to the temporal characteristics of the events. In particular, we use Bayesian Networks and probabilistically extended Petri Nets for recognizing simple and complex events, respectively. To demonstrate the proposed approach, a framework has been implemented for recognizing human-object interactions in the video monitoring domain. The experimental results show that our approach improves event recognition performance compared to the widely used deterministic approach.

    This work has been partially supported by the Spanish Administration agency CDTI (CENIT-VISION 2007-1007), by the Spanish Government (TEC2011-25995, EventVideo), by the Consejería de Educación of the Comunidad de Madrid and by the European Social Fund.
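    An illustrative sketch of a probabilistically extended recognizer for a complex event, reduced here to a linear chain of transitions (a degenerate Petri net) fired by uncertain low-level detections; event names, the confidence product, and the thresholds are all hypothetical, not the paper's formulation.

```python
# Sketch: fire transitions on uncertain detections and accumulate the
# event confidence as the product of the observation probabilities.
class ProbabilisticEventChain:
    def __init__(self, transitions):
        # transitions: ordered list of (name, min_confidence) to fire in sequence
        self.transitions = transitions
        self.step, self.confidence = 0, 1.0

    def observe(self, name, prob):
        """Feed a detection (name, probability); fire if it matches the
        next expected transition and is confident enough."""
        if self.step < len(self.transitions):
            expected, min_conf = self.transitions[self.step]
            if name == expected and prob >= min_conf:
                self.step += 1
                self.confidence *= prob
        return self.step == len(self.transitions)  # True when event recognized

chain = ProbabilisticEventChain([("person_enters", 0.5),
                                 ("object_put_down", 0.5),
                                 ("person_leaves", 0.5)])
for event in [("person_enters", 0.9), ("object_put_down", 0.8),
              ("person_leaves", 0.85)]:
    done = chain.observe(*event)
print(done, round(chain.confidence, 3))  # True 0.612
```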

    On the evaluation of background subtraction algorithms without ground-truth

    Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. J. C. San Miguel and J. M. Martínez, "On the evaluation of background subtraction algorithms without ground-truth," in 2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance, 2013, pp. 180-187.

    In video-surveillance systems, the moving object segmentation stage (commonly based on background subtraction) has to deal with several issues such as noise, shadows and multimodal backgrounds. Hence, its failure is inevitable, and its automatic evaluation is a desirable requirement for online analysis. In this paper, we propose a hierarchy of existing performance measures not based on ground truth for video object segmentation. Then, four measures based on color and motion are selected and examined in detail with different segmentation algorithms and standard test sequences for video object segmentation. Experimental results show that color-based measures perform better than motion-based measures, and that background multimodality heavily reduces the accuracy of all the evaluation results obtained.

    This work is partially supported by the Spanish Government (TEC2007-65400, SemanticVideo), by Cátedra Infoglobal-UAM for "Nuevas Tecnologías de video aplicadas a la seguridad", by the Consejería de Educación of the Comunidad de Madrid and by the European Social Fund.
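    A hedged sketch of one plausible color-based, ground-truth-free measure: a good foreground mask should separate regions with dissimilar color statistics. The per-channel Bhattacharyya-based distance below is an illustrative instantiation, not one of the specific measures ranked in the paper.

```python
# Sketch: score a segmentation mask by how well it separates foreground
# and background color distributions (higher = better separated).
import numpy as np

def color_separability(frame, mask, bins=16):
    """frame: (H, W, 3) uint8 image; mask: (H, W) bool foreground mask."""
    fg, bg = frame[mask], frame[~mask]
    if len(fg) == 0 or len(bg) == 0:
        return 0.0
    score = 0.0
    for c in range(3):  # compare per-channel color densities
        h_fg, _ = np.histogram(fg[:, c], bins=bins, range=(0, 256), density=True)
        h_bg, _ = np.histogram(bg[:, c], bins=bins, range=(0, 256), density=True)
        # Bhattacharyya coefficient: 1 means identical distributions
        bc = np.sum(np.sqrt(h_fg * h_bg)) * (256 / bins)
        score += 1.0 - min(bc, 1.0)
    return score / 3.0
```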

    On the effect of motion segmentation techniques in description based adaptive video transmission

    Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. J. C. San Miguel and J. M. Martínez, "On the effect of motion segmentation techniques in description based adaptive video transmission," in AVSS '07: Proceedings of the 2007 IEEE Conference on Advanced Video and Signal Based Surveillance, 2007, pp. 359-364.

    This paper presents the results of analysing the effect of different motion segmentation techniques in a system that transmits the information captured by a static surveillance camera in an adaptive way, based on the online generation of descriptions at different levels of detail. The video sequences are analyzed to detect the regions of activity (motion analysis) and to differentiate them from the background, and the corresponding descriptions (mainly MPEG-7 moving regions) are generated together with the textures of the moving regions and the associated background image. Depending on the available bandwidth, different levels of transmission are specified, ranging from sending just the generated descriptions to a transmission including all the images associated with the moving objects and the background. We study the effect of three motion segmentation algorithms on several aspects, such as segmentation accuracy, size of the generated descriptions, computational efficiency and quality of the reconstructed data.

    This work is partially supported by Cátedra Infoglobal-UAM para Nuevas Tecnologías de video aplicadas a la seguridad. This work is also supported by the Ministerio de Ciencia y Tecnología of the Spanish Government under project TIN2004-07860 (MEDUSA) and by the Comunidad de Madrid under project P-TIC-0223-0505 (PROMULTIDIS).
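    A sketch of the bandwidth-dependent level selection described above, under assumptions: three payload tiers mirroring the paper's idea (descriptions only, plus moving-region textures, plus background image), with per-frame sizes and thresholds that are hypothetical; in practice the background image would typically be sent once rather than per frame.

```python
# Sketch: pick the richest transmission level that fits the per-frame budget.
def select_transmission_level(bandwidth_kbps, sizes_kb, frame_rate=5):
    """sizes_kb: per-frame payload sizes, e.g.
       {"descriptions": 4, "textures": 40, "background": 120}"""
    budget_kb = bandwidth_kbps / 8.0 / frame_rate  # kilobytes per frame
    tiers = [
        ("descriptions+textures+background",
         sizes_kb["descriptions"] + sizes_kb["textures"] + sizes_kb["background"]),
        ("descriptions+textures",
         sizes_kb["descriptions"] + sizes_kb["textures"]),
        ("descriptions", sizes_kb["descriptions"]),
    ]
    for name, size in tiers:  # richest tier first
        if size <= budget_kb:
            return name
    return "skip-frame"

print(select_transmission_level(2000, {"descriptions": 4, "textures": 40,
                                       "background": 120}))
# 'descriptions+textures'  (50 KB/frame budget fits 44 KB but not 164 KB)
```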

    Shadow detection in video surveillance by maximizing agreement between independent detectors

    Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. J. C. SanMiguel and J. M. Martínez, "Shadow detection in video surveillance by maximizing agreement between independent detectors," in 16th IEEE International Conference on Image Processing (ICIP 2009), pp. 1141-1144.

    This paper starts from the idea of automatically choosing the appropriate thresholds for a shadow detection algorithm, based on maximizing the agreement between two independent shadow detectors without training data. Firstly, this shadow detection algorithm is described; then, it is adapted to analyze video surveillance sequences. Some modifications are introduced to increase its robustness in generic surveillance scenarios and to reduce its overall computational cost (critical in some video surveillance applications). Experimental results show that the proposed modifications increase detection reliability compared to previous shadow detection algorithms and perform considerably well across a variety of surveillance scenarios.

    Work supported by the Spanish Government (TEC2007-65400, SemanticVideo), by Cátedra Infoglobal-UAM for "Nuevas Tecnologías de video aplicadas a la seguridad", by the Spanish Administration agency CDTI (CENIT-VISION 2007-1007), by the Comunidad de Madrid (S-050/TIC-0223, ProMultiDis), by the Consejería de Educación of the Comunidad de Madrid and by the European Social Fund.
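    A hedged sketch of two independent shadow cues whose thresholds could then be tuned by agreement maximization, in the spirit of the grid-scan sketch shown for the first entry above. It assumes a current frame and a background model in HSV with float channels; the cue definitions and default thresholds are illustrative, not the paper's detectors.

```python
# Sketch: two independent shadow cues combined by agreement.
import numpy as np

def shadow_by_brightness(frame_v, bg_v, lo=0.4, hi=0.9):
    """Shadow cue 1: pixel darker than background, within a ratio band."""
    ratio = frame_v.astype(float) / np.maximum(bg_v.astype(float), 1e-6)
    return (ratio >= lo) & (ratio <= hi)

def shadow_by_chromaticity(frame_h, bg_h, max_hue_diff=10.0):
    """Shadow cue 2: hue nearly unchanged with respect to the background."""
    diff = np.abs(frame_h.astype(float) - bg_h.astype(float))
    diff = np.minimum(diff, 180.0 - diff)  # hue wraps (OpenCV range [0, 180))
    return diff <= max_hue_diff

def shadow_mask(frame_hsv, bg_hsv):
    """A pixel is labeled shadow only where both independent cues agree."""
    m1 = shadow_by_brightness(frame_hsv[..., 2], bg_hsv[..., 2])
    m2 = shadow_by_chromaticity(frame_hsv[..., 0], bg_hsv[..., 0])
    return m1 & m2
```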

    A semantic-guided and self-configurable framework for video analysis

    The final publication is available at Springer via http://dx.doi.org/10.1007/s00138-011-0397-x.

    This paper presents a distributed and scalable framework for video analysis that automatically estimates the optimal workflow required for the analysis of different application domains. It integrates several technologies related to data acquisition, visual analysis tools, communication protocols, and data storage. Moreover, hierarchical semantic representations are included in the framework to describe the application domain, the analysis capabilities, and the user preferences. The analysis workflow is determined automatically by selecting, among those available in the framework, the most appropriate tools for each domain, exploiting the relations between the semantic descriptions. The experimental results in the video surveillance domain demonstrate that the proposed approach successfully composes optimal workflows for video analysis applications.

    This work has been partially supported by the Spanish Government (TEC2011-25995), by the Consejería de Educación of the Comunidad de Madrid and by the European Social Fund.
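    An illustrative sketch of semantic workflow composition: chain analysis tools by matching the input/output types they declare until the requested output is produced. The tool names and type labels are hypothetical, and the greedy chaining stands in for the paper's exploitation of relations between semantic descriptions.

```python
# Sketch: compose a workflow by greedily chaining compatible tools.
TOOLS = [
    {"name": "bg_subtraction",   "in": "frame",   "out": "fg_mask"},
    {"name": "blob_tracker",     "in": "fg_mask", "out": "tracks"},
    {"name": "event_recognizer", "in": "tracks",  "out": "events"},
]

def compose_workflow(source_type, goal_type, tools=TOOLS):
    """Chain tools from the source data type to the requested output type."""
    workflow, current = [], source_type
    while current != goal_type:
        step = next((t for t in tools if t["in"] == current), None)
        if step is None:
            raise ValueError(f"no tool consumes type {current!r}")
        workflow.append(step["name"])
        current = step["out"]
    return workflow

print(compose_workflow("frame", "events"))
# ['bg_subtraction', 'blob_tracker', 'event_recognizer']
```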